Abstract

The notion of a conditional linear predictor is used as a
distribution-free method for eliminating the individual-specific effects in a
class  of  nonlinear,  unobserved  components  panel  data  models.   The  methodology 
is  applied  to  a  general  count  model,  which  allows  for  individual  dispersion 
in  addition  to  an  individual  mean  effect.   As  a  corollary  of  the  general 
results,  the  multinomial  quasi-conditional  maximum  likelihood  estimator  is 
shown  to  be  consistent  and  asymptotically  normal  when  only  the  first  two 
moments  in  the  unobserved  effects  model  have  been  correctly  specified.   This 
has  important  implications  for  analyzing  count  data  in  panel  contexts. 
Simple,  robust  specification  tests  for  this  class  of  count  models  are  also 
developed.   A  second  example  covers  the  case  where  the  variance  is 
proportional  to  the  square  of  the  mean,  encompassing  unobserved  component 
gamma  regression  models  for  panel  data.   Models  with  serial  correlation  are 
briefly  discussed. 


1.  Introduction 

In the standard linear unobserved effects model, it is well known that
consistent  estimators  are  available  under  correct  specification  of  the 
conditional  mean  and  strict  exogeneity  of  the  explanatory  variables, 
conditional  on  the  latent  individual  effect.   The  usual  fixed  effects 
(within)  estimator  is  consistent,  as  is  the  minimum  chi-square  estimator 
proposed  by  Chamberlain  (1982) . 

To be more precise, let $\{(y_i,x_i,\phi_i): i=1,2,\ldots\}$ be a sequence of
independent, identically distributed random variables, where
$y_i \equiv (y_{i1},\ldots,y_{iT})'$ is $T \times 1$, $x_i \equiv (x_{i1}',\ldots,x_{iT}')'$ is $T \times K$,
and $\phi_i$ is the scalar unobserved effect. The linear unobserved effects
model specifies that, for each $t=1,\ldots,T$,

$$E(y_{it}|x_i,\phi_i) = E(y_{it}|x_{it},\phi_i) = \phi_i + x_{it}\beta_o, \tag{1.1}$$

where $\beta_o$ is a $K \times 1$ vector of unknown parameters. Equation (1.1) incorporates
a linearity assumption and strict exogeneity of $x_i$ conditional on the latent
variable $\phi_i$; see Chamberlain (1984) for further discussion. Even though
additional assumptions -- in particular, $V(y_i|x_i,\phi_i) = \sigma_o^2 I_T$ -- are typically
imposed  in  carrying  out  inference  after  fixed  effects  estimation,  assumption 
(1.1)  and  standard  regularity  conditions  are  sufficient  for  the  fixed  effects 
estimator  to  be  consistent  and  asymptotically  normally  distributed.   Thus, 
the  fixed  effects  estimator  is  robust  to  conditional  heteroskedasticity 
across  individuals,  as  well  as  to  serial  correlation  across  time  for  a 
particular  individual. 

As noted by Chamberlain (1980), the fixed effects estimator is also the
conditional maximum likelihood estimator (CMLE) under the additional
assumption

$$y_i | x_i, \phi_i \sim N(\phi_i j_T + x_i\beta_o, \sigma_o^2 I_T), \tag{1.3}$$

where $j_T \equiv (1,1,\ldots,1)'$ is $T \times 1$. The fixed effects estimator of $\beta_o$ can be
shown to maximize the log-likelihood based on the density of $y_i$ conditional
on $\sum_{t=1}^{T} y_{it}$, $x_i$, and $\phi_i$, which turns out to be independent of $\phi_i$. The
consistency of the fixed effects estimator under (1.1) can therefore be
interpreted as an important robustness property of the quasi-conditional
maximum likelihood estimator (QCMLE) of $\beta_o$.

Unlike linear models, little has been written on distribution-free
estimation  of  nonlinear  unobserved  effects  models.   Chamberlain  (1980,1984) 
analyzes  a  fixed  effects  logit  model,  Hausman,  Hall,  and  Griliches  (1984) 
(hereafter,  HHG)  consider  a  variety  of  unobserved  effects  models  for  count 
panel  data,  and  Papke  (1989)  estimates  count  models  for  firm  births  for  a 
panel  of  states.   All  of  these  applications  rely  on  the  method  of  conditional 
maximum  likelihood,  where  a  sufficient  statistic  (the  sum  of  the  binary  or 
count  variable  across  time)  is  conditioned  on  to  remove  the  unobserved 
effect. As far as I know, the robustness properties of these CMLEs to
misspecification of the initially specified joint distribution have not been
investigated. It is important to see that, even though the resulting
conditional density (e.g. the multinomial) is typically in the linear
exponential family (LEF), the robustness of the QCMLE in the LEF, e.g.
Gourieroux,  Monfort,  and  Trognon  (1984)  (hereafter,  GMT),  cannot  be  appealed 
to.   This  is  because,  except  in  special  cases,  the  expectation  associated 
with  the  LEF  conditional  density  is  misspecified  if  the  initial  joint 
distribution  in  the  unobserved  components  model  is  misspecified. 

For models of nonnegative variables -- in particular models for count
data -- it would be useful to have a class of estimators that requires
minimal distributional assumptions, while further relaxing the first two
moment assumptions appearing in the literature. The conditional MLE approach
is inherently limited by its reliance on a completely specified joint
distribution. A new, distribution-free approach that nevertheless eliminates
the unobserved effects for a broad class of models is needed. This paper
develops the notion of a conditional linear predictor (CLP), and shows how
CLPs can be used to eliminate individual effects in certain nonlinear
unobserved effects models.

Section  2  introduces  the  model  that  motivated  this  research,  and  shows 
how  the  unobserved  effects  can  be  removed  by  computing  an  appropriate  CLP. 
The  conditional  mean  and  variance  assumptions  are  substantially  more  general 
than  those  implied  by  the  most  flexible  negative  binomial  specification  of 
HHG.   In  particular,  the  model  allows  not  only  for  an  individual  effect  in 
the  mean,  but  it  also  allows  for  individual  under-  or  overdispersion  that  can 
be  unrelated  to  the  mean  effect.   Independence  across  time  is  not  assumed, 
and  moments  higher  than  the  second  are  unrestricted. 

Estimation  of  conditional  linear  predictors,  which  is  a  straightforward 
application  of  generalized  method  of  moments  (Hansen  (1982)),  is  covered  in 
section  3.   Section  4  discusses  specification  testing  in  the  context  of  CLPs. 
Section  5  analyzes  the  model  of  section  2  in  detail,  and  suggests  several 
consistent  and  asymptotically  normal  estimators.   In  particular,  the 
multinomial  QCMLE  used  by  HHG  for  the  fixed  effects  Poisson  model  is  shown  to 
be  consistent  and  asymptotically  normal  much  more  generally. 

Section 6 briefly covers a multiplicative unobserved effects model where
the conditional variance is proportional to the square of the conditional mean,
as occurs in gamma and lognormal regression models. Section 7 outlines how
serial correlation, conditional on the unobserved effects, can be
accommodated.

2.  Motivation:  An  Unobserved  Effects  Model  for  Count  Panel  Data 


Let $\{(y_i,x_i,\phi_i): i=1,2,\ldots\}$ be a sequence of i.i.d. random variables,
where $y_i \equiv (y_{i1},\ldots,y_{iT})'$ is an observable $T \times 1$ vector of counts,
$x_i \equiv (x_{i1}',x_{i2}',\ldots,x_{iT}')'$ is a $T \times K$ matrix of observable conditioning
variables ($x_{it}$ is $1 \times K$, $t=1,\ldots,T$), and $\phi_i$ is an unobservable random
scalar. The fixed effects Poisson (FEP) model analyzed by HHG assumes that,
for $t=1,\ldots,T$,

$$y_{it} | x_i, \phi_i \sim \text{Poisson}(\phi_i\mu(x_{it},\beta_o)) \tag{2.1}$$

and

$$y_{it}, y_{ir} \text{ are independent conditional on } x_i, \phi_i, \quad t \neq r, \tag{2.2}$$

where

$$E(y_{it}|x_i,\phi_i) = E(y_{it}|x_{it},\phi_i) = \phi_i\mu(x_{it},\beta_o) \tag{2.3}$$

and $\beta_o$ is a $P \times 1$ vector of unknown parameters. Actually, HHG take
$\mu(x_{it},\beta) = \exp(x_{it}\beta)$, but there is no need to use this particular functional
form. However, it is convenient to choose $\mu$ so that $\mu(x_{it},\beta)$ is well-defined
and positive for all $x_{it}$ and $\beta$. Assumptions (2.1) and (2.2) incorporate strict
exogeneity of $x_i$ conditional on $\phi_i$, independence of $y_{it}$ and $y_{ir}$ conditional
on $x_i$ and $\phi_i$, and the Poisson distributional assumption.

If a particular functional form for $E(\phi_i|x_i)$ is specified, then
estimation of $\beta_o$ can proceed under (2.3) only; further assumptions on
$D(y_i|x_i,\phi_i)$ are not required. Equation (2.3), a model for $E(\phi_i|x_i)$, and the
law of iterated expectations can be used to obtain $E(y_i|x_i)$ as a function of
$\beta_o$ and other parameters. For example, if $\mu(x_{it},\beta)$ is specified to be
$\exp(x_{it}\beta)$, one might also assume that

$$E(\phi_i|x_i) = \exp\Big(\alpha_o + \sum_{t=1}^{T} x_{it}\lambda_{ot}\Big)$$

for $K \times 1$ vectors $\lambda_{ot}$, $t=1,\ldots,T$. However, in many cases one does not wish to
be so precise about how $\phi_i$ and $x_i$ are related.

As an alternative to specifying $E(\phi_i|x_i)$ (or $D(\phi_i|x_i)$), HHG show how
(2.1) and (2.2) can be used in Andersen's (1970) conditional ML methodology
(see also Palmgren (1981) for the following derivation). Let
$n_i \equiv \sum_{t=1}^{T} y_{it}$ denote the sum across time of the explained variable. Then HHG
show that

$$y_i | n_i, x_i, \phi_i \sim \text{Multinomial}(n_i, p_1(x_i,\beta_o),\ldots,p_T(x_i,\beta_o)), \tag{2.4}$$

where

$$p_t(x_i,\beta_o) \equiv \mu(x_{it},\beta_o)\Big/\Big[\sum_{r=1}^{T}\mu(x_{ir},\beta_o)\Big]. \tag{2.5}$$

Because this distribution does not depend on $\phi_i$, (2.4) is also the
distribution of $y_i$ conditional on $n_i$ and $x_i$. Therefore, $\beta_o$ can be estimated
by standard conditional MLE techniques. For later use, the conditional
log-likelihood for observation i, apart from terms not depending on $\beta$, is

$$\ell_i(\beta) = \sum_{t=1}^{T} y_{it}\log[p_t(x_i,\beta)]. \tag{2.6}$$
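As an illustration only, the quasi-log-likelihood (2.6) summed over individuals can be sketched in Python. The sketch assumes the exponential mean function $\mu(x_{it},\beta) = \exp(x_{it}\beta)$ used by HHG; the function name `multinomial_quasi_loglik` and the array layout are hypothetical choices made for this example, not part of the paper.

```python
import numpy as np

def multinomial_quasi_loglik(beta, y, x):
    """Sum over i and t of y_it * log p_t(x_i, beta), as in (2.6).

    Assumes mu(x_it, beta) = exp(x_it @ beta) (the HHG choice).
    y: (N, T) array of nonnegative outcomes.
    x: (N, T, K) array of regressors.
    """
    index = x @ beta                                  # (N, T): x_it * beta
    # log p_t = index_t - log(sum_r exp(index_r)), computed stably
    m = index.max(axis=1, keepdims=True)
    log_sum = m + np.log(np.exp(index - m).sum(axis=1, keepdims=True))
    log_p = index - log_sum
    return float((y * log_p).sum())
```

Because the probabilities in (2.5) are ratios, adding an individual-specific constant to the index $x_{it}\beta$ leaves the value unchanged, which is the sense in which the individual effect drops out.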


Because the multinomial distribution is in the LEF, the results of GMT
imply a certain amount of robustness of the QCMLE. If (2.4) holds then

$$E(y_{it}|n_i,x_i) = p_t(x_i,\beta_o)n_i. \tag{2.7}$$

Conversely, it follows by GMT that, if (2.7) holds, then the QCMLE is
consistent and asymptotically normal, even if the multinomial distribution is
misspecified. Other than the FEP model (2.1) and (2.2), there is at least
one other interesting case where (2.7) holds. Let $\alpha_i$ and $\gamma_i$ be unobserved
individual effects. If

$$y_{it} | x_i, \alpha_i, \gamma_i \sim \text{Negative Binomial}(\alpha_i\mu(x_{it},\beta_o), \gamma_i) \tag{2.8}$$

and

$$y_{it}, y_{ir} \text{ are independent conditional on } x_i, \alpha_i, \gamma_i, \quad t \neq r, \tag{2.9}$$

then (2.7) can be shown to hold. By GMT, the QCMLE based on the multinomial
distribution provides consistent estimates of $\beta_o$ under (2.8) and (2.9). This
is useful but still somewhat restrictive.

A robust approach consists of specifying at most a couple of low order
conditional moments. Let $\phi_i$ and $\varphi_i$ be scalar unobserved effects. A strictly
weaker set of assumptions than (2.1)-(2.2) and (2.8)-(2.9) is

$$E(y_{it}|x_i,\phi_i,\varphi_i) = \phi_i\mu(x_{it},\beta_o) \tag{2.10}$$

$$V(y_{it}|x_i,\phi_i,\varphi_i) = \varphi_i E(y_{it}|x_i,\phi_i,\varphi_i) = \varphi_i\phi_i\mu(x_{it},\beta_o) \tag{2.11}$$

$$CV(y_{it},y_{ir}|x_i,\phi_i,\varphi_i) = 0, \quad t \neq r. \tag{2.12}$$

Equations (2.10)-(2.12) specify the first two moments of $y_i$ conditional on $x_i$,
$\phi_i$, and $\varphi_i$, and these are more general than the first two moments implied by
(2.1)-(2.2) ($\varphi_i \equiv 1$) and as general as the first two moments implied by
(2.8)-(2.9) ($\phi_i \equiv \alpha_i/\gamma_i$, $\varphi_i \equiv 1 + 1/\gamma_i$). Although (2.12) assumes zero
conditional covariance, independence of the components of $y_i$ conditional on
$x_i$, $\phi_i$, and $\varphi_i$ is not assumed, nor is the distribution assumed to be Poisson,
Negative Binomial, or anything in particular.

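The primary question addressed in this paper is: In models such as
(2.10)-(2.12), how can $\phi_i$ and $\varphi_i$ be eliminated, so that $\beta_o$ can be estimated?
One answer is really very simple. Define the sum of counts, $n_i$, as above.
Then, as defined in section 3, the linear predictor of $y_{it}$ on $(1,n_i)'$,
conditional on $(x_i,\phi_i,\varphi_i)$, is given by

$$\begin{aligned}
L(y_{it}|1,n_i;x_i,\phi_i,\varphi_i) &= E(y_{it}|x_i,\phi_i,\varphi_i) + \frac{CV(y_{it},n_i|x_i,\phi_i,\varphi_i)}{V(n_i|x_i,\phi_i,\varphi_i)}\,[n_i - E(n_i|x_i,\phi_i,\varphi_i)] \\
&= E(y_{it}|x_i,\phi_i,\varphi_i) + \frac{V(y_{it}|x_i,\phi_i,\varphi_i)}{V(n_i|x_i,\phi_i,\varphi_i)}\,[n_i - E(n_i|x_i,\phi_i,\varphi_i)] \\
&= \phi_i\mu(x_{it},\beta_o) + \frac{\varphi_i\phi_i\mu(x_{it},\beta_o)}{\sum_{r=1}^{T}\varphi_i\phi_i\mu(x_{ir},\beta_o)}\Big[n_i - \sum_{r=1}^{T}\phi_i\mu(x_{ir},\beta_o)\Big] \\
&= \frac{\mu(x_{it},\beta_o)}{\sum_{r=1}^{T}\mu(x_{ir},\beta_o)}\,n_i.
\end{aligned} \tag{2.13}$$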

There are a few points worth noting about this derivation. First, (2.13) is
generally not the conditional expectation $E(y_{it}|n_i,x_i,\phi_i,\varphi_i)$, as was derived
under (2.1) and (2.2) or (2.8) and (2.9). Thus, a class of estimators must
be constructed to account for the fact that (2.13) represents
$L(y_{it}|1,n_i;x_i,\phi_i,\varphi_i)$, but not necessarily $E(y_{it}|n_i,x_i,\phi_i,\varphi_i)$. Second, as is
desired, this conditional linear predictor does not depend on $\phi_i$ or $\varphi_i$.
Third, in this example, $L(y_{it}|1,n_i;x_i,\phi_i,\varphi_i) = L(y_{it}|n_i;x_i,\phi_i,\varphi_i)$, so that
unity could have been excluded from the projection set. However, knowledge
of this equality expands the type of orthogonality conditions that can be
used in estimating $\beta_o$, and leads directly to the robustness result for the
multinomial QCMLE.
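The algebra behind (2.13) can be checked numerically. The following Python sketch takes hypothetical values of the two individual effects and of $\mu(x_{it},\beta_o)$, builds the conditional moments from (2.10)-(2.12), and computes the intercept and slope of the linear projection of $y_{it}$ on $(1,n_i)$. The function name and the numerical values are illustrative assumptions only.

```python
import numpy as np

def clp_of_y_on_one_and_n(phi, varphi, mu):
    """Intercepts and slopes of L(y_t | 1, n; x, phi, varphi) under (2.10)-(2.12).

    mu: length-T array holding mu(x_t, beta_o) for one individual.
    """
    mu = np.asarray(mu, dtype=float)
    mean_y = phi * mu                 # conditional means, (2.10)
    var_y = varphi * phi * mu         # conditional variances, (2.11)
    mean_n = mean_y.sum()
    var_n = var_y.sum()               # zero conditional covariances, (2.12)
    cov_y_n = var_y                   # Cov(y_t, n) = V(y_t), again by (2.12)
    slope = cov_y_n / var_n
    intercept = mean_y - slope * mean_n
    return intercept, slope

# Hypothetical values chosen only for illustration.
mu = np.array([0.5, 1.0, 2.5])
intercept, slope = clp_of_y_on_one_and_n(phi=3.0, varphi=0.4, mu=mu)
```

Under these assumptions the intercepts are zero and the slopes equal $\mu(x_{it},\beta_o)/\sum_r \mu(x_{ir},\beta_o)$ whatever values are chosen for the two effects, matching the second and third points above.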

I return to this example in section 5. The next two sections cover
estimation and specification testing in the context of conditional linear
predictors.

3.  Estimating  Conditional  Linear  Predictors 

This  section  defines  and  discusses  estimation  of  conditional  linear 
predictors.   Unsurprisingly,  intuition  about  linear  predictors  in  an 
unconditional  setting  generally  carries  over  to  the  conditional  case.   Let  y 
be  Jxl ,  z  be  Kxl ,  and  w  be  Ixl.   In  what  follows,  z  may  or  may  not  contain 
unity  as  one  of  its  elements.   This  distinction  turns  out  to  be  important  in 
the  applications.   In  section  5,  unity  can  and  should  be  included  in  z ;  in 
the  section  6  example,  unity  must  be  excluded  from  z. 

Subsequently,  without  stating  it  explicitly,  an  expectation  is  assumed 
to  exist  whenever  it  is  written  down.   Define  the  following  conditional 
moments : 

$$\Sigma_{yz}(w) \equiv E(yz'|w), \quad \Sigma_{zz}(w) \equiv E(zz'|w). \tag{3.1}$$

Assume that $\Sigma_{zz}(w)$ is positive definite with probability one (w.p.1). The
following definition holds only w.p.1, but this is left implicit throughout.

DEFINITION 3.1: Let y, z, and w be defined as above. The linear
predictor of y on z, conditional on w, is defined to be

$$L(y|z;w) \equiv \Sigma_{yz}(w)\Sigma_{zz}^{-1}(w)z \equiv C_o(w)z, \tag{3.2}$$

where $C_o(w)$ is the $J \times K$ matrix

$$C_o(w) \equiv \Sigma_{yz}(w)\Sigma_{zz}^{-1}(w). \quad \blacksquare$$

Note that L(y|z;w) is always linear in z, but is generally a nonlinear
function of w. When the context is clear, L(y|z;w) is simply called a
conditional linear predictor (CLP). The difference between y and its CLP has
orthogonality properties that are immediate extensions from
unconditional linear predictor theory.

LEMMA 3.1: Let y, z, and w be as in Definition 3.1. Define

$$u \equiv y - L(y|z;w) = y - C_o(w)z. \tag{3.3}$$

Then

$$E(uz'|w) = 0. \tag{3.4}$$

PROOF: $uz' = [y - C_o(w)z]z' = yz' - C_o(w)zz'$, so that

$$\begin{aligned}
E(uz'|w) &= E(yz'|w) - C_o(w)E(zz'|w) \\
&= \Sigma_{yz}(w) - \Sigma_{yz}(w)\Sigma_{zz}^{-1}(w)\Sigma_{zz}(w) \\
&= 0. \quad \blacksquare
\end{aligned}$$

The  next  corollary,  which  motivates  the  class  of  estimators  considered, 
follows  immediately  by  the  law  of  iterated  expectations. 

COROLLARY 3.1: Let y, z, w, and u be as in Lemma 3.1, and let D(w) be
a $JK \times L$ random matrix. Then

$$E[D(w)'(z \otimes I_J)u] = E[D(w)'\text{vec}(uz')] = 0. \quad \blacksquare \tag{3.5}$$
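The second equality in (3.5) rests on the identity $\text{vec}(uz') = (z \otimes I_J)u$, where vec stacks columns. A small NumPy check, with arbitrary dimensions and random draws chosen purely for illustration, is:

```python
import numpy as np

rng = np.random.default_rng(0)
J, K = 3, 2                       # illustrative dimensions only
u = rng.normal(size=(J, 1))
z = rng.normal(size=(K, 1))

# Column-stacking vec of the J x K matrix u z'
vec_uz = (u @ z.T).reshape(-1, 1, order="F")
# Kronecker form (z ⊗ I_J) u, a JK x 1 vector
kron_form = np.kron(z, np.eye(J)) @ u
```

Both arrays stack the blocks $z_k u$, $k = 1,\ldots,K$, so they agree element by element.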

Suppose now that $C_o(w) = C(w,\theta_o)$, where $C(w,\theta)$ is a known function of w
and the $P \times 1$ parameter vector $\theta \in \Theta$. Then, for a matrix function D(w) as
defined in Corollary 3.1, $\theta_o$ solves (perhaps not uniquely) the system of
equations

$$E[D(w)'(z \otimes I_J)(y - C(w,\theta)z)] = 0. \tag{3.6}$$

Equation (3.6) can be exploited to obtain a variety of consistent estimators
of $\theta_o$.

For  the  remainder  of  this  section,  let  { (y . , z . ,w. ) : i=l , 2 , . . . )  be  an 


i.i.d  sequence,  where  y.  is  Jxl,  z,  is  Kxl ,  and  w.  is  Ixl.   Extension  of  the 

1  i  1 

subsequent  results  to  heterogeneous  and/or  dependent  situations  is  fairly 
straightforward  but  notationally  cumbersome.   The  available  sample  size  is 
denoted  N. 

Assume that for a known function $C(w_i,\theta)$,

$$L(y_i|z_i;w_i) = C(w_i,\theta_o)z_i. \tag{3.7}$$

In the applications, not all of the vector $w_i$ is observed ($w_i \equiv (x_i,\phi_i,\varphi_i)$ in
(2.10)-(2.12)), and $C(w_i,\theta_o)$ does not depend on the unobserved elements. For
notational simplicity, this section treats $w_i$ as entirely observed. When $w_i$
contains unobservables, the orthogonality conditions constructed below are
necessarily restricted to functions of the observables.

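The class of estimators is assumed to solve a first order condition
asymptotically. To specify the estimating equations, let $D(w_i,\theta,\gamma)$ be a $JK \times P$
matrix depending on $w_i$, $\theta$, and possibly a vector of nuisance parameters,
$\gamma \in \Gamma$. Assume that an estimator $\hat\gamma_N$ is available such that

$$\sqrt{N}(\hat\gamma_N - \gamma^*) = O_p(1) \text{ for some } \gamma^* \in \Gamma. \tag{3.8}$$

Then, $\hat\theta_N$ is assumed to satisfy

$$N^{-1/2}\sum_{i=1}^{N} D(w_i,\hat\theta_N,\hat\gamma_N)'[z_i \otimes I_J][y_i - C(w_i,\hat\theta_N)z_i] = o_p(1)$$

or, in shorthand,

$$N^{-1/2}\sum_{i=1}^{N} D_i(\hat\theta_N,\hat\gamma_N)'[z_i \otimes I_J]u_i(\hat\theta_N) = o_p(1), \tag{3.9}$$

where $u_i(\theta) \equiv y_i - C(w_i,\theta)z_i$. As further shorthand, let $u_i \equiv u_i(\theta_o)$,
$\hat{D}_i \equiv D_i(\hat\theta_N,\hat\gamma_N)$, and $\hat{u}_i \equiv u_i(\hat\theta_N)$. In all of the examples in this paper, $\hat\theta_N$ is an
exact solution to the P equations

$$\sum_{i=1}^{N} D_i(\theta,\hat\gamma_N)'[z_i \otimes I_J]u_i(\theta) = 0, \tag{3.10}$$

so that (3.9) is trivially satisfied.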

The weak consistency of $\hat\theta_N$ for $\theta_o$ hinges on a standard uniform weak law
of large numbers and a suitable identification condition. Identification
requires that $\theta_o$ is the only element of

$$\mathcal{S} \equiv \{\theta \in \Theta: E[D_i(\theta,\gamma^*)'(z_i \otimes I_J)u_i(\theta)] = 0\}. \tag{3.11}$$

By Corollary 3.1, $\theta_o$ is an element of $\mathcal{S}$; as usual, this must be strengthened
to the assumption that $\theta_o$ is the unique solution to the asymptotic
orthogonality condition.

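Establishing the asymptotic normality of $\sqrt{N}(\hat\theta_N - \theta_o)$ is also relatively
straightforward, but a little algebraic care is required to show that the
natural estimate of the asymptotic variance matrix of $\sqrt{N}(\hat\theta_N - \theta_o)$ is valid.
The slight complication arises because $E(u_i|w_i,z_i)$ is not necessarily zero; (3.7)
guarantees only that $E(u_iz_i'|w_i) = 0$.

The first step in deriving the asymptotic distribution of $\sqrt{N}(\hat\theta_N - \theta_o)$ is
standard; it amounts to showing that the asymptotic distribution of
$\sqrt{N}(\hat\theta_N - \theta_o)$ does not depend on that of $\sqrt{N}(\hat\gamma_N - \gamma^*)$. This follows by a
mean value expansion:

$$\begin{aligned}
N^{-1/2}\sum_{i=1}^{N} D_i(\hat\theta_N,\hat\gamma_N)'[z_i \otimes I_J]u_i(\hat\theta_N) &= N^{-1/2}\sum_{i=1}^{N} D_i(\hat\theta_N,\gamma^*)'[z_i \otimes I_J]u_i(\hat\theta_N) \\
&\quad + E[(u_i'(z_i' \otimes I_J) \otimes I_P)\,\partial\text{vec}(D_i(\theta_o,\gamma^*)')/\partial\gamma]\sqrt{N}(\hat\gamma_N - \gamma^*) + o_p(1).
\end{aligned}$$

But

$$E[u_i'(z_i' \otimes I_J)|w_i] = 0,$$

so that

$$N^{-1/2}\sum_{i=1}^{N} D_i(\hat\theta_N,\hat\gamma_N)'[z_i \otimes I_J]u_i(\hat\theta_N) = N^{-1/2}\sum_{i=1}^{N} D_i(\hat\theta_N,\gamma^*)'[z_i \otimes I_J]u_i(\hat\theta_N) + o_p(1). \tag{3.12}$$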

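Therefore, define the $P \times 1$ "score" vector

$$s_i(\theta) \equiv D_i(\theta,\gamma^*)'[z_i \otimes I_J]u_i(\theta), \tag{3.13}$$

so that $E[s_i(\theta_o)|w_i] = 0$ by Corollary 3.1. Another mean value expansion
gives

$$o_p(1) = N^{-1/2}\sum_{i=1}^{N} s_i(\theta_o) + E[H_i(\theta_o)]\sqrt{N}(\hat\theta_N - \theta_o),$$

where $H_i(\theta) \equiv \nabla_\theta s_i(\theta)$ is the $P \times P$ derivative of $s_i(\theta)$. Provided that

$$A_o \equiv -E[H_i(\theta_o)] \text{ is nonsingular,} \tag{3.14}$$

$\sqrt{N}(\hat\theta_N - \theta_o)$ has a familiar asymptotic representation:

$$\sqrt{N}(\hat\theta_N - \theta_o) = A_o^{-1}N^{-1/2}\sum_{i=1}^{N} s_i(\theta_o) + o_p(1).$$

Letting

$$\begin{aligned}
B_o &\equiv E[s_i(\theta_o)s_i(\theta_o)'] \\
&= E[D_i(\theta_o,\gamma^*)'[z_i \otimes I_J]u_iu_i'[z_i' \otimes I_J]D_i(\theta_o,\gamma^*)],
\end{aligned} \tag{3.15}$$

it follows that

$$\sqrt{N}(\hat\theta_N - \theta_o) \overset{a}{\sim} N(0, A_o^{-1}B_oA_o^{-1\prime}). \tag{3.16}$$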


This discussion is summarized with an informal theorem.

THEOREM 3.1: Under (3.7), (3.8), (3.9), (3.11), (3.14), and standard
regularity conditions, (3.16) holds. $\blacksquare$


The matrix $B_o$ is easily estimated by a standard outer product of the
score:

$$\hat{B}_N \equiv N^{-1}\sum_{i=1}^{N} s_i(\hat\theta_N;\hat\gamma_N)s_i(\hat\theta_N;\hat\gamma_N)' \tag{3.17}$$

$$\begin{aligned}
&= N^{-1}\sum_{i=1}^{N} D_i(\hat\theta_N,\hat\gamma_N)'[z_i \otimes I_J]u_i(\hat\theta_N)u_i(\hat\theta_N)'[z_i' \otimes I_J]D_i(\hat\theta_N,\hat\gamma_N) \\
&\equiv N^{-1}\sum_{i=1}^{N} \hat{D}_i'[z_i \otimes I_J]\hat{u}_i\hat{u}_i'[z_i' \otimes I_J]\hat{D}_i,
\end{aligned} \tag{3.18}$$

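which is at least positive semi-definite. The most convenient estimator of
$A_o$ excludes terms depending on the derivative of $D_i(\theta,\gamma^*)$ with respect to $\theta$.
But

$$\begin{aligned}
H_i(\theta) &= -D_i(\theta,\gamma^*)'[z_i \otimes I_J][z_i' \otimes I_J]\nabla_\theta C_i(\theta) \\
&\quad + (u_i(\theta)'(z_i' \otimes I_J) \otimes I_P)\,\partial\text{vec}[D_i(\theta,\gamma^*)']/\partial\theta,
\end{aligned}$$

where $\nabla_\theta C_i(\theta) \equiv \partial\text{vec}[C_i(\theta)]/\partial\theta$ is $JK \times P$ and $\partial\text{vec}[D_i(\theta,\gamma^*)']/\partial\theta$ is $JKP \times P$.
Therefore,

$$\begin{aligned}
-H_i(\theta_o) &= D_i(\theta_o,\gamma^*)'[z_i \otimes I_J][z_i' \otimes I_J]\nabla_\theta C_i(\theta_o) \\
&\quad - (u_i(\theta_o)'(z_i' \otimes I_J) \otimes I_P)\,\partial\text{vec}[D_i(\theta_o,\gamma^*)']/\partial\theta.
\end{aligned} \tag{3.19}$$

Under (3.7),

$$E[u_i(\theta_o)'(z_i' \otimes I_J)|w_i] = 0$$

and, because $\partial\text{vec}[D_i(\theta_o,\gamma^*)']/\partial\theta$ depends only on $w_i$,

$$A_o = E[D_i(\theta_o,\gamma^*)'[z_iz_i' \otimes I_J]\nabla_\theta C_i(\theta_o)]. \tag{3.20}$$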

It follows that a consistent estimator of $A_o$ is simply

$$\hat{A}_N \equiv N^{-1}\sum_{i=1}^{N} \hat{D}_i'[z_iz_i' \otimes I_J]\nabla_\theta\hat{C}_i. \tag{3.21}$$

Inference is carried out on $\theta_o$ by treating $\hat\theta_N$ as normally distributed with
mean $\theta_o$ and variance

$$\hat{A}_N^{-1}\hat{B}_N\hat{A}_N^{-1\prime}/N. \tag{3.22}$$
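To make the variance calculation concrete, the following Python sketch assembles the sample analogues of $A_o$ in (3.20) and $B_o$ in (3.15) and forms the sandwich matrix used for inference. It assumes the per-observation matrices have already been evaluated at the estimates; the function name `clp_sandwich` and the array layout are hypothetical and not taken from the paper.

```python
import numpy as np

def clp_sandwich(D, z, u, grad_C):
    """Estimate Avar(theta_hat) as A^{-1} B A^{-1}' / N.

    D:      (N, J*K, P) matrices D_i evaluated at the estimates
    z:      (N, K)      projection variables z_i
    u:      (N, J)      residuals u_i = y_i - C_i z_i
    grad_C: (N, J*K, P) derivatives of vec[C_i(theta)] w.r.t. theta
    """
    N, K = z.shape
    J = u.shape[1]
    P = D.shape[2]
    I_J = np.eye(J)
    A = np.zeros((P, P))
    B = np.zeros((P, P))
    for i in range(N):
        zi = z[i].reshape(K, 1)
        z_kron = np.kron(zi, I_J)                 # (J*K, J): z_i ⊗ I_J
        score = D[i].T @ z_kron @ u[i]            # (P,): score s_i
        B += np.outer(score, score)               # outer product of the score
        A += D[i].T @ np.kron(zi @ zi.T, I_J) @ grad_C[i]
    A /= N
    B /= N
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv.T / N
```

The loop is written for readability rather than speed. Note that only first derivatives of $C_i(\theta)$ enter; no derivative of $D_i$ is required, in line with the argument leading to (3.20).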

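Several special cases can be cast in terms of (3.9). One useful class
of estimators is multivariate weighted nonlinear least squares (MWNLS).
Given a $J \times J$ symmetric, positive semidefinite matrix $G(w_i,\hat\gamma_N)$, choose $\hat\theta_N$ to
solve

$$\min_{\theta \in \Theta}\sum_{i=1}^{N} [y_i - C(w_i,\theta)z_i]'G(w_i,\hat\gamma_N)[y_i - C(w_i,\theta)z_i]. \tag{3.23}$$

Note that the weighting matrix $G(w_i,\gamma)$ is allowed to depend on $w_i$ but not on
$z_i$. Here, $\hat\gamma_N$ is an initial estimator. The MWNLS estimator falls under
Theorem 3.1 by choosing

$$D(w_i,\theta,\gamma)' \equiv [\nabla_\theta C_1(w_i,\theta)'G(w_i,\gamma) \,|\, \cdots \,|\, \nabla_\theta C_K(w_i,\theta)'G(w_i,\gamma)], \tag{3.24}$$

where $\nabla_\theta C(w_i,\theta)$ has been partitioned as

$$\nabla_\theta C(w_i,\theta)' \equiv [\nabla_\theta C_1(w_i,\theta)' \,|\, \cdots \,|\, \nabla_\theta C_K(w_i,\theta)'],$$

and $\nabla_\theta C_k(w_i,\theta)$ is $J \times P$, $k=1,\ldots,K$. Then

$$\begin{aligned}
D(w_i,\theta,\gamma)'[z_i \otimes I_J]u_i(\theta) &= \nabla_\theta C(w_i,\theta)'[z_i \otimes G(w_i,\gamma)]u_i(\theta) \\
&= \nabla_\theta C(w_i,\theta)'[z_i \otimes I_J]G(w_i,\gamma)u_i(\theta).
\end{aligned}$$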
In terms of more familiar notation, let

$$m_i(\theta) \equiv m(w_i,z_i,\theta) \equiv C(w_i,\theta)z_i \tag{3.25}$$

denote the "regression" function (but recall that $m(w_i,z_i,\theta_o)$ need not equal
$E(y_i|w_i,z_i)$). Then

$$M_i(\theta) \equiv \nabla_\theta m_i(\theta) = [z_i' \otimes I_J]\nabla_\theta C_i(\theta), \tag{3.26}$$

so that

$$s_i(\theta_o) = M_i(\theta_o)'G_i(\gamma^*)u_i \tag{3.27}$$

and

$$A_o = E[M_i(\theta_o)'G_i(\gamma^*)M_i(\theta_o)]. \tag{3.28}$$

The consistent estimators of $A_o$ and $B_o$ are simply

$$\hat{A}_N \equiv N^{-1}\sum_{i=1}^{N} \nabla_\theta\hat{C}_i'[z_i \otimes I_J]\hat{G}_i[z_i' \otimes I_J]\nabla_\theta\hat{C}_i = N^{-1}\sum_{i=1}^{N} \hat{M}_i'\hat{G}_i\hat{M}_i \tag{3.29}$$

and

$$\hat{B}_N \equiv N^{-1}\sum_{i=1}^{N} \hat{M}_i'\hat{G}_i\hat{u}_i\hat{u}_i'\hat{G}_i\hat{M}_i = N^{-1}\sum_{i=1}^{N} \hat{s}_i\hat{s}_i', \tag{3.30}$$

which are the familiar robust formulas from MWNLS theory when estimating
conditional expectations. Thus, even when estimating conditional linear
predictors, simple positive definite estimates of $A_o$ and $B_o$ are available.

Only rarely does it happen that

$$E(u_iu_i'|w_i,z_i) = [G(w_i,\gamma^*)]^{-1}, \tag{3.31}$$

in which case $A_o = B_o$ and either $\hat{A}_N^{-1}/N$ or $\hat{B}_N^{-1}/N$ can be used to estimate the
asymptotic variance of $\hat\theta_N$.

The generalized method of moments estimators studied by Hansen (1982)
are also covered by Theorem 3.1. Let $L(w_i,\phi)$ be a $JK \times M$ matrix depending
on $w_i$ and a nuisance parameter $\phi$. Assume that $\sqrt{N}(\hat\phi_N - \phi^*) = O_p(1)$. Let $\hat\Xi_N$
denote a symmetric, positive semi-definite matrix estimator such that
$\sqrt{N}(\hat\Xi_N - \Xi^*) = O_p(1)$, where $\Xi^*$ is a symmetric, positive definite matrix. In
the current context, a GMM estimator $\hat\theta_N$ solves

$$\min_{\theta \in \Theta}\tau_N(\theta),$$

where

$$\tau_N(\theta) \equiv \Big[N^{-1}\sum_{i=1}^{N} L(w_i,\hat\phi_N)'[z_i \otimes I_J]u_i(\theta)\Big]'\,\hat\Xi_N\,\Big[N^{-1}\sum_{i=1}^{N} L(w_i,\hat\phi_N)'[z_i \otimes I_J]u_i(\theta)\Big]. \tag{3.32}$$


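Under differentiability assumptions, this estimator can be shown to be a
special case of Theorem 3.1. The first order condition solved by $\hat\theta_N$ is

$$\Big[\sum_{i=1}^{N} \hat{L}_i'[z_iz_i' \otimes I_J]\nabla_\theta C_i(\hat\theta_N)\Big]'\,\hat\Xi_N\,\Big[\sum_{i=1}^{N} \hat{L}_i'[z_i \otimes I_J]u_i(\hat\theta_N)\Big] = 0.$$

Letting

$$\hat{R}_N \equiv N^{-1}\sum_{i=1}^{N} L_i(\hat\phi_N)'[z_iz_i' \otimes I_J]\nabla_\theta C_i(\hat\theta_N), \tag{3.33}$$

which is an $M \times P$ matrix, $\hat\theta_N$ equivalently solves

$$\hat{R}_N'\hat\Xi_N N^{-1/2}\sum_{i=1}^{N} \hat{L}_i'[z_i \otimes I_J]u_i(\hat\theta_N) = 0. \tag{3.34}$$

Therefore, in the notation of Theorem 3.1, let

$$D(w_i;\gamma) \equiv L(w_i,\phi)\,\Xi\,R, \tag{3.35}$$

where $\gamma \equiv (\phi', (\text{vech}(\Xi))', (\text{vec}(R))')'$; this choice of D does not depend on $\theta$.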


Although the GMM estimator is consistent and asymptotically normal for a
variety of weighting matrices $\hat\Xi_N$, the efficient estimator -- given the choice
of $L(w_i,\phi^*)$ -- is always available. This is the minimum chi-square
estimator, obtained by choosing $\hat\Xi_N$ to be a consistent estimator of

$$\{E[L(w_i,\phi^*)'[z_i \otimes I_J]u_iu_i'[z_i' \otimes I_J]L(w_i,\phi^*)]\}^{-1}. \tag{3.36}$$

Subsequently, $\hat\Xi_N$ is assumed to be chosen in this way. This requires an
initial, consistent estimator of $\theta_o$, such as a MWNLS estimator.

The asymptotic variance of $\sqrt{N}(\hat\theta_N - \theta_o)$, where $\hat\theta_N$ now denotes the minimum
chi-square estimator, is $A_o^{-1} = B_o^{-1}$, because $A_o$ and $B_o$ are both equal to
$R^{*\prime}\Xi^*R^*$, where $\Xi^*$ is given by (3.36) and

$$R^* \equiv E[L(w_i,\phi^*)'[z_iz_i' \otimes I_J]\nabla_\theta C(w_i,\theta_o)].$$

The asymptotic variance of $\hat\theta_N$ is estimated by $(\hat{R}_N'\hat\Xi_N\hat{R}_N)^{-1}/N$.

The problem of estimating conditional linear predictors also fits into
the framework of Chamberlain (1987), who derives the efficiency bound for
estimators derived from conditional moment restrictions. For CLPs,
the conditional moment restrictions available for estimating $\theta_o$ are given by

$$E[(y_i - C(w_i,\theta_o)z_i)z_i'|w_i] = 0$$

or, in vector notation,

$$E[(z_i \otimes I_J)(y_i - C(w_i,\theta_o)z_i)|w_i] = 0$$

(a total of JK conditional moment restrictions). Letting

$$\Sigma_o(w_i) \equiv E[(z_i \otimes I_J)u_iu_i'(z_i' \otimes I_J)|w_i]$$

and

$$\Psi_o(w_i) \equiv E[(z_iz_i' \otimes I_J)|w_i]\nabla_\theta C(w_i,\theta_o),$$

the lower bound is obtained from Chamberlain (1987, equation (1.11)):

$$\{E[\Psi_o(w_i)'\Sigma_o^{-1}(w_i)\Psi_o(w_i)]\}^{-1}.$$

To achieve this bound, one can proceed as in Newey (1987) and
nonparametrically estimate $\Sigma_o(w_i)$ and $E[(z_iz_i' \otimes I_J)|w_i]$. Although studying
this kind of procedure is beyond the scope of this paper, the lower bound
calculation at least isolates which terms need to be nonparametrically
estimated.

4.  Specification  Testing

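Specification tests can be derived using the approach of Newey (1985).
If $\Lambda(w_i,\theta,\gamma)$ is a $JK \times Q$ matrix depending on $w_i$, $\theta$, and nuisance parameters
$\gamma$, a general class of tests is based on the sample covariance

$$N^{-1}\sum_{i=1}^{N} \Lambda(w_i,\hat\theta_N,\hat\gamma_N)'[z_i \otimes I_J]u_i(\hat\theta_N).$$

Under $H_0$, the following expansions are easily seen to hold (for similar
reasoning, see Wooldridge (1990)):

$$\begin{aligned}
N^{-1/2}\sum_{i=1}^{N} &\Lambda_i(\hat\theta_N,\hat\gamma_N)'[z_i \otimes I_J]u_i(\hat\theta_N) \\
&= N^{-1/2}\sum_{i=1}^{N} \Lambda(w_i,\hat\theta_N,\gamma^*)'[z_i \otimes I_J]u_i(\hat\theta_N) + o_p(1) \\
&= N^{-1/2}\sum_{i=1}^{N} \Lambda(w_i,\theta_o,\gamma^*)'[z_i \otimes I_J]u_i - E[\Lambda_i(\theta_o,\gamma^*)'[z_iz_i' \otimes I_J]\nabla_\theta C_i(\theta_o)]\sqrt{N}(\hat\theta_N - \theta_o) + o_p(1) \\
&= N^{-1/2}\sum_{i=1}^{N} \psi_i(\theta_o,\gamma^*) - K_o'A_o^{-1}N^{-1/2}\sum_{i=1}^{N} s_i(\theta_o,\gamma^*) + o_p(1) \\
&= N^{-1/2}\sum_{i=1}^{N} [\Lambda_i(\theta_o,\gamma^*) - D_i(\theta_o,\gamma^*)A_o^{-1\prime}K_o]'[z_i \otimes I_J]u_i + o_p(1),
\end{aligned}$$

where $A_o$ is given by (3.20),

$$\psi_i(\theta_o,\gamma^*) \equiv \Lambda(w_i,\theta_o,\gamma^*)'[z_i \otimes I_J]u_i,$$

and

$$K_o \equiv E[\nabla_\theta C_i(\theta_o)'[z_iz_i' \otimes I_J]\Lambda_i(\theta_o,\gamma^*)].$$

Thus, let

$$\hat{K}_N \equiv N^{-1}\sum_{i=1}^{N} \nabla_\theta\hat{C}_i'[z_iz_i' \otimes I_J]\hat\Lambda_i$$

and

$$\hat\xi_i' \equiv (\hat\Lambda_i - \hat{D}_i\hat{A}_N^{-1\prime}\hat{K}_N)'[z_i \otimes I_J]\hat{u}_i, \quad i=1,\ldots,N \tag{4.3}$$

(note that $\hat\xi_i$ is a $1 \times Q$ vector). A valid test statistic is obtained as
$NR_u^2 = N - SSR$ from the regression

$$1 \text{ on } \hat\xi_i, \quad i=1,\ldots,N. \tag{4.4}$$

Under the null hypothesis

$$H_0: E[u_i(\theta_o)z_i'|w_i] = 0, \tag{4.5}$$

$N - SSR \overset{a}{\sim} \chi_Q^2$, provided there are no redundant columns in $\Lambda_i(\theta_o,\gamma^*)$. As a
special case, this procedure covers a robust, regression-based Hausman test
for comparing the multinomial QCMLE and MNLS in the nonlinear unobserved
effects model (2.10)-(2.12). This test is discussed in detail in section 5.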
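The auxiliary regression in (4.4) is simple to compute once the $1 \times Q$ indicator vectors from (4.3) are stacked into an $N \times Q$ array. The sketch below is illustrative only; the function name `n_minus_ssr` and the array name `xi` are assumptions made for this example.

```python
import numpy as np

def n_minus_ssr(xi):
    """Return N - SSR from the OLS regression of 1 on the rows of xi (N x Q).

    No intercept is included, as in regression (4.4).
    """
    xi = np.asarray(xi, dtype=float)
    ones = np.ones(xi.shape[0])
    coef, *_ = np.linalg.lstsq(xi, ones, rcond=None)
    resid = ones - xi @ coef
    return xi.shape[0] - float(resid @ resid)
```

The returned value equals the quadratic form $(\sum_i \hat\xi_i)(\sum_i \hat\xi_i'\hat\xi_i)^{-1}(\sum_i \hat\xi_i)'$ and would be compared with critical values of the chi-square distribution with Q degrees of freedom.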

If the minimum chi-square estimator is used, where the number of
orthogonality conditions M is greater than the number of parameters P, then
the GMM overidentification test is available from Hansen (1982). The test
statistic is simply N times the value of the minimum chi-square objective
function. Under (4.5),

$$N\tau_N(\hat\theta_N) \overset{a}{\sim} \chi_{M-P}^2,$$

where $\tau_N(\theta)$ is defined in (3.32) with appropriate choice of $\hat\Xi_N$.

5.  Application  to  Count  Models  with  Individual-Specific  Dispersion 

This section applies the theory of sections 3 and 4 to the model introduced in section 2. Nothing of what follows relies on $y_i$ being a vector of counts, but the example is motivated by the count models of HHG. Let $\{(y_i,x_i,\phi_i,\varphi_i): i = 1,2,\ldots\}$ be a sequence of i.i.d. random variables. As in section 2, $y_i$ is $T \times 1$, $x_i$ is $T \times K$, and these are observed. $\phi_i$ and $\varphi_i$ are unobserved, scalar random variables representing the individual effects. For clarity, the model introduced in section 2 is reproduced here. For $t = 1,\ldots,T$,

$$E(y_{it} \mid x_i,\phi_i,\varphi_i) = \phi_i\mu(x_{it},\beta_o) \qquad (5.1)$$

$$V(y_{it} \mid x_i,\phi_i,\varphi_i) = \varphi_i\phi_i\mu(x_{it},\beta_o) \qquad (5.2)$$

$$CV(y_{it},y_{ir} \mid x_i,\phi_i,\varphi_i) = 0, \quad t \neq r. \qquad (5.3)$$

This model allows for individual mean effects as well as a separate, individual dispersion, with variance to mean ratio

$$V(y_{it} \mid x_i,\phi_i,\varphi_i)/E(y_{it} \mid x_i,\phi_i,\varphi_i) = \varphi_i. \qquad (5.4)$$

The addition of $\varphi_i$ allows for under- or overdispersion, depending on the individual. Assumptions (5.1), (5.2), and (5.3) are more flexible than the first two moments of all of the fixed effects models used by HHG. The FEP model imposes $\varphi_i = 1$. The fixed effects negative binomial (FENB) model of HHG imposes $\varphi_i = 1 + \phi_i$. Not only is underdispersion ruled out for all individuals, but the amount of overdispersion is tied directly to the mean effect. In addition, (5.3) is weaker than independence, and no distributional assumption is made.

Section 2 showed that the linear predictor of $y_{it}$ on $(1,n_i)'$, conditional on $(x_i,\phi_i,\varphi_i)$, is free of $\phi_i$ and $\varphi_i$. In vector notation,

$$L(y_i \mid 1,n_i;x_i,\phi_i,\varphi_i) = p(x_i,\beta_o)n_i, \qquad (5.5)$$

where $p(x_i,\beta)$ denotes the $T \times 1$ vector with $t$-th element

$$p_t(x_i,\beta) \equiv \frac{\mu(x_{it},\beta)}{\sum_{r=1}^{T}\mu(x_{ir},\beta)}. \qquad (5.6)$$

Equation (5.5) implies orthogonality conditions of the form

$$E[D(x_i)'((1,n_i)' \otimes I_T)u_i] = 0, \qquad (5.7)$$

where $D(x_i)$ is any $2T \times L$ matrix function of $x_i$, and $u_i \equiv y_i - p(x_i,\beta_o)n_i$. This allows for a variety of method of moments procedures, as well as some simple, well known estimators. For example, Theorem 3.1 implies that the MNLS estimator $\hat\beta_N$, which solves

$$\min_{\beta \in B}\ \sum_{i=1}^{N}(y_i - p(x_i,\beta)n_i)'(y_i - p(x_i,\beta)n_i), \qquad (5.8)$$

is consistent for $\beta_o$ and asymptotically normally distributed. The asymptotic variance of $\sqrt{N}(\hat\beta_N - \beta_o)$ is $A_o^{-1}B_oA_o^{-1}$, and consistent estimators of $A_o$ and $B_o$ are given by

$$\hat A_N \equiv N^{-1}\sum_{i=1}^{N}n_i^2\nabla_\beta\hat p_i'\nabla_\beta\hat p_i \qquad (5.9)$$

and

$$\hat B_N \equiv N^{-1}\sum_{i=1}^{N}n_i^2\nabla_\beta\hat p_i'\hat u_i\hat u_i'\nabla_\beta\hat p_i, \qquad (5.10)$$

where $\nabla_\beta\hat p_i \equiv \nabla_\beta p(x_i,\hat\beta_N)$ is the $T \times P$ gradient of $p(x_i,\beta)$ evaluated at $\hat\beta_N$. This is easily extended to MWNLS with $T \times T$ weighting matrix $\hat G_i \equiv G(x_i,\hat\gamma_N)$. The MWNLS estimator solves

$$\min_{\beta \in B}\ \sum_{i=1}^{N}(y_i - p(x_i,\beta)n_i)'\hat G_i(y_i - p(x_i,\beta)n_i), \qquad (5.11)$$

and $A_o$ and $B_o$ are easily estimated by

$$\hat A_N \equiv N^{-1}\sum_{i=1}^{N}n_i^2\nabla_\beta\hat p_i'\hat G_i\nabla_\beta\hat p_i \qquad (5.12)$$

and

$$\hat B_N \equiv N^{-1}\sum_{i=1}^{N}n_i^2\nabla_\beta\hat p_i'\hat G_i\hat u_i\hat u_i'\hat G_i\nabla_\beta\hat p_i. \qquad (5.13)$$
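
To make (5.5), (5.6) and the MNLS objective (5.8) concrete, here is a minimal Python sketch. It assumes the exponential mean $\mu(x_{it},\beta) = \exp(x_{it}\beta)$, which is only one possible choice and is not imposed by the text; the names `alloc_probs` and `mnls_objective` and the simulated arrays are hypothetical.

```python
import numpy as np

def alloc_probs(x, beta):
    """p_t(x_i, beta) as in (5.6), assuming mu = exp(x_it @ beta); x has shape (N, T, K)."""
    index = x @ beta                                # (N, T) linear index
    index -= index.max(axis=1, keepdims=True)       # stabilise; the shift cancels in the ratio
    mu = np.exp(index)
    return mu / mu.sum(axis=1, keepdims=True)

def mnls_objective(beta, y, x):
    """Sum over i of u_i(beta)'u_i(beta), with u_i = y_i - p(x_i, beta) n_i, as in (5.8)."""
    n = y.sum(axis=1, keepdims=True)                # n_i = sum_t y_it
    u = y - alloc_probs(x, beta) * n
    return np.sum(u ** 2)

# Placeholder data (simulated, for illustration only)
rng = np.random.default_rng(1)
x_demo = rng.normal(size=(5, 4, 2))
y_demo = rng.poisson(2.0, size=(5, 4)).astype(float)
beta_demo = np.array([0.5, -0.3])

p_demo = alloc_probs(x_demo, beta_demo)
u_demo = y_demo - p_demo * y_demo.sum(axis=1, keepdims=True)
```

Because the elements of $p(x_i,\beta)$ sum to one, the residuals $u_{it}$ sum to zero within each $i$ for any $\beta$.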

Even more interesting is that the multinomial QCMLE (or FEP QCMLE) is consistent under only assumptions (5.1)-(5.3). This is remarkable given that the conditional multinomial distribution was derived from a Poisson distribution with only one unobserved effect, i.e. $\varphi_i \equiv 1$ (see section 2). Moreover, the distribution can be very different from the Poisson, and independence of the elements of $y_i$ conditional on $(x_i,\phi_i,\varphi_i)$ is not assumed.

To see that the multinomial QCMLE is covered by Theorem 3.1, note that the gradient of the quasi-log likelihood (see (2.6)) is

$$
\begin{aligned}
s_i(\beta) \equiv \nabla_\beta\ell_i(\beta)' &= \sum_{t=1}^{T}y_{it}[\nabla_\beta p_t(x_i,\beta)'/p_t(x_i,\beta)] \\
&= \sum_{t=1}^{T}[\nabla_\beta p_t(x_i,\beta)'/p_t(x_i,\beta)](y_{it} - p_t(x_i,\beta)n_i) \qquad (5.14) \\
&= \nabla_\beta p(x_i,\beta)'W(x_i,\beta)(y_i - p(x_i,\beta)n_i) \\
&= \nabla_\beta p(x_i,\beta)'W(x_i,\beta)u_i(\beta), \qquad (5.15)
\end{aligned}
$$

where $W(x_i,\beta) \equiv [\mathrm{diag}\{p_1(x_i,\beta),\ldots,p_T(x_i,\beta)\}]^{-1}$. Equation (5.14) follows from the fact that $\sum_{t=1}^{T}p_t(x_i,\beta) \equiv 1$ for all $\beta$. Because $z_i' \equiv (1,n_i)$, (5.15) is seen to be of the form (3.10) with $D_i(\beta)' \equiv [\nabla_\beta p_i(\beta)'W_i(\beta) \mid 0]$. If the FEP model is maintained, the estimate of the asymptotic variance of $\hat\beta_N$, obtained from the estimated information matrix, is

$$\left[\sum_{i=1}^{N}\nabla_\beta\hat m_i'\hat V_i^{-1}\nabla_\beta\hat m_i\right]^{-1}, \qquad (5.16)$$

where $m_i(\beta) \equiv p_i(\beta)n_i$ would normally be the conditional mean function associated with the multinomial distribution, $\nabla_\beta\hat m_i \equiv \nabla_\beta m_i(\hat\beta_N) = \nabla_\beta p_i(\hat\beta_N)n_i$, and $\hat V_i \equiv V(n_i,x_i,\hat\beta_N) \equiv \mathrm{diag}\{p_{i1}(\hat\beta_N)n_i,\ldots,p_{iT}(\hat\beta_N)n_i\}$. Expression (5.16) is familiar from standard likelihood theory involving the multinomial distribution.

Unless the original Poisson model holds, $\hat A_N^{-1}/N$ produces inappropriate standard errors. The robust form is $\hat A_N^{-1}\hat B_N\hat A_N^{-1}/N$, where

$$
\begin{aligned}
\hat B_N &\equiv N^{-1}\sum_{i=1}^{N}\nabla_\beta p(x_i,\hat\beta_N)'W(x_i,\hat\beta_N)\hat u_i\hat u_i'W(x_i,\hat\beta_N)\nabla_\beta p(x_i,\hat\beta_N) \\
&= N^{-1}\sum_{i=1}^{N}\nabla_\beta\hat m_i'\hat V_i^{-1}\hat u_i\hat u_i'\hat V_i^{-1}\nabla_\beta\hat m_i. \qquad (5.17)
\end{aligned}
$$

The estimator $\hat A_N^{-1}\hat B_N\hat A_N^{-1}/N$ is robust to arbitrary serial correlation in $\{u_{it}: t = 1,2,\ldots,T\}$. Note that, by definition, $\sum_{t=1}^{T}u_{it} \equiv 0$, so that the $u_{it}$ might generally be expected to exhibit negative serial correlation. This is the case under (2.1) and (2.2). From McCullagh and Nelder (1989, p. 165) the correlation between $u_{it}$ and $u_{ir}$, conditional on $(n_i,x_i)$, is

$$-\,p_{it}(\beta_o)p_{ir}(\beta_o)\big/\{p_{it}(\beta_o)[1 - p_{it}(\beta_o)]\,p_{ir}(\beta_o)[1 - p_{ir}(\beta_o)]\}^{1/2}. \qquad (5.18)$$

This particular negative correlation, which is used implicitly in the estimator $\hat A_N^{-1}/N$ of the asymptotic variance of $\hat\beta_N$, need no longer hold under (5.1)-(5.3). In fact, it is no longer possible to compute the correlation between $u_{it}$ and $u_{ir}$, conditional on $(n_i,x_i)$, under assumptions (5.1)-(5.3). Thus, the robust covariance matrix estimator should always be computed; this can produce standard errors smaller or larger than those obtained from (5.16).

The robustness of the QCMLE to distributional misspecification suggests a research methodology different from that used by HHG. They compute a specification test for the FEP model that checks whether the data support the serial correlation pattern (5.18). HHG properly view a violation of (5.18) as a rejection of the original Poisson specification. However, this section has shown that the estimates of $\beta_o$ in model (5.1)-(5.3) are still consistent and asymptotically normal, whether or not the correlation structure (5.18) holds. Rather than testing whether the multinomial QCMLE estimates are consistent for $\beta_o$, the HHG test looks for departures from the multinomial distributional assumption. A test of model (5.1)-(5.3) should be based on the testable implication that the linear predictor of $y_i$ on $(1,n_i)'$, conditional on $(x_i,\phi_i,\varphi_i)$, is of the form (5.5). Because QCMLE and MNLS are both consistent for $\beta_o$ under (5.5), a robust form of Hausman's (1978) test for comparing the two estimators is natural. Here I focus on a regression form of the test that requires computation of the QCMLE only, and results in a particularly simple research methodology.

The regression-based Hausman test is a special case of the tests discussed in section 4, but it is more directly obtained from Wooldridge (1991). Because the Poisson model is the nominal distribution for count data, it makes sense to construct the robust test to be optimal if the Poisson model is true, and the tests in Wooldridge (1991) are constructed in this manner. Let $\hat u_i$, $\nabla_\beta\hat m_i$ and $\hat V_i$ be defined as above, evaluated at the QCMLE $\hat\beta_N$. Define the weighted quantities $\tilde u_i \equiv \hat V_i^{-1/2}\hat u_i$, $\nabla_\beta\tilde m_i \equiv \hat V_i^{-1/2}\nabla_\beta\hat m_i$, and $\tilde\Lambda_i \equiv \hat V_i^{1/2}\nabla_\beta\hat m_i$ (this lambda is related, but not equal, to that appearing in section 4). The robust Hausman test is easily computed by first orthogonalizing $\tilde\Lambda_i$ with respect to $\nabla_\beta\tilde m_i$. Let $\tilde E_i$ be the $T \times Q$ matrix residuals from the matrix regression

$$\tilde\Lambda_i \ \text{ on } \ \nabla_\beta\tilde m_i, \quad i = 1,\ldots,N. \qquad (5.19)$$

Then compute $H \equiv N - \mathrm{SSR}$ from the regression

$$1 \ \text{ on } \ \tilde u_i'\tilde E_i, \quad i = 1,\ldots,N; \qquad (5.20)$$

under (5.5), $H \overset{a}{\sim} \chi^2_Q$.
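
The two regressions (5.19) and (5.20) can be sketched in Python as follows. This is a hedged illustration rather than a definitive implementation: it treats (5.19) as a pooled least squares regression over all $N \cdot T$ stacked rows, assumes every $n_i > 0$ so that $\hat V_i$ is invertible, and assumes all $P$ columns are nonredundant (so $Q = P$). The function name `robust_hausman` and the placeholder inputs are hypothetical.

```python
import numpy as np

def robust_hausman(u_hat, grad_m, v_diag):
    """N - SSR from (5.20).

    u_hat:  (N, T) residuals y_i - p_i n_i at the QCMLE
    grad_m: (N, T, P) gradients of m_i = p_i n_i at the QCMLE
    v_diag: (N, T) diagonal of V_i = diag{p_it n_i}; must be strictly positive
    """
    N, T, P = grad_m.shape
    rt_v = np.sqrt(v_diag)
    u_til = u_hat / rt_v                            # V^{-1/2} u
    gm_til = grad_m / rt_v[:, :, None]              # V^{-1/2} grad m
    lam_til = grad_m * rt_v[:, :, None]             # V^{1/2} grad m
    # (5.19): pooled matrix regression of lam_til on gm_til
    g = gm_til.reshape(N * T, P)
    lam = lam_til.reshape(N * T, P)
    coef, *_ = np.linalg.lstsq(g, lam, rcond=None)
    e_til = (lam - g @ coef).reshape(N, T, P)       # residual matrices E_i
    # (5.20): regress 1 on u_til_i' E_i
    z = np.einsum('it,itq->iq', u_til, e_til)
    ones = np.ones(N)
    b, *_ = np.linalg.lstsq(z, ones, rcond=None)
    resid = ones - z @ b
    return N - resid @ resid

# Placeholder inputs (random numbers, only to show the shapes involved)
rng = np.random.default_rng(2)
u_demo = rng.normal(size=(200, 4))
gm_demo = rng.normal(size=(200, 4, 2))
v_demo = rng.uniform(0.5, 2.0, size=(200, 4))
h_stat = robust_hausman(u_demo, gm_demo, v_demo)
```

In an application the three inputs would come from the fitted QCMLE, and the statistic would be compared with a chi-square critical value with $Q$ degrees of freedom.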

Because the moment assumptions (5.1)-(5.3) encompass HHG's FENB model, a rejection of (5.5) based on $H$ necessarily implies misspecification of the FENB specification. A rejection implies some failure of (5.1)-(5.3), so one needs to work harder in specifying $E(y_i \mid x_i,\phi_i,\varphi_i)$ and $V(y_i \mid x_i,\phi_i,\varphi_i)$.

If the Hausman statistic fails to reject, one might conclude that the first two moments in the latent variable model are correctly specified (this assumes that the Hausman test has power against interesting departures from (5.1)-(5.3)). If the QCMLE estimates are reasonably precise, then one could stop here. However, if (2.1) and (2.2) fail to hold, the QCMLEs could (but need not) have large standard errors.

Before searching for a more efficient estimator, it is useful to have direct evidence concerning the appropriateness of the multinomial distribution; if $E(y_i \mid n_i,x_i)$ and $V(y_i \mid n_i,x_i)$ match the first two moments of the Multinomial$(n_i,p_1(x_i,\beta_o),\ldots,p_T(x_i,\beta_o))$ distribution, worthwhile efficiency gains over the multinomial QCMLE are likely to be difficult to realize. A comparison of the usual and robust standard errors provides some guidance. A more formal test is HHG's serial correlation test for the FEP model. However, the form of White's (1982) information matrix test covered by Wooldridge (1991) has some potential advantages. First, it imposes correctness of only the first two conditional moments (in this case $E(y_i \mid n_i,x_i)$ and $V(y_i \mid n_i,x_i)$) under $H_0$, but it is asymptotically equivalent to nonrobust forms which take the entire distribution to be correctly specified. Second, it uses an estimate of the expected Hessian, $\hat A_N$, in its construction. Consequently, this test probably has better finite sample properties than HHG's outer product test; the latter is known to reject far too often in many situations.

The IM test for the multinomial model follows from Wooldridge (1991, Procedure 4.1), with a slight modification due to the singularity of $E(u_iu_i' \mid n_i,x_i)$. Let $\Omega(n_i,x_i,\beta)$ denote the $T \times T$ covariance matrix of the Multinomial$(n_i,p_1(x_i,\beta),\ldots,p_T(x_i,\beta))$ distribution. The $t$-th diagonal element of $\Omega(n_i,x_i,\beta)$ is $n_ip_t(x_i,\beta)[1 - p_t(x_i,\beta)]$ while the $(r,t)$-th element is $-n_ip_r(x_i,\beta)p_t(x_i,\beta)$. Let $\hat\Omega_i \equiv \Omega(n_i,x_i,\hat\beta_N)$ and

$$\hat\xi_i \equiv \{\mathrm{vec}(\hat u_i\hat u_i' - \hat\Omega_i)\}'[\hat V_i \otimes \hat V_i]^{-1}\hat\Lambda_i - \hat s_i'\hat A_N^{-1}\hat J_N, \qquad (5.21)$$

where

$$\hat J_N \equiv N^{-1}\sum_{i=1}^{N}\nabla_\beta\hat\Omega_i'[\hat V_i \otimes \hat V_i]^{-1}\hat\Lambda_i$$

is $P \times Q$, $\nabla_\beta\Omega_i(\beta) \equiv \partial\,\mathrm{vec}\{\Omega_i(\beta)\}/\partial\beta$ is $T^2 \times P$, and $\hat\Lambda_i \equiv \Lambda_i(x_i,\hat\beta_N)$ is a $T^2 \times Q$ matrix of selected linearly independent columns of the $T^2 \times P^2$ matrix $[\nabla_\beta\hat m_i \otimes \nabla_\beta\hat m_i]$. The IM test statistic is $IM = NR_u^2 = N - \mathrm{SSR}$ from the regression

$$1 \ \text{ on } \ \hat\xi_i, \quad i = 1,\ldots,N. \qquad (5.22)$$

Under the hypothesis that $E(y_i \mid n_i,x_i)$ and $V(y_i \mid n_i,x_i)$ match the first two moments of the Multinomial$(n_i,p_1(x_i,\beta_o),\ldots,p_T(x_i,\beta_o))$ distribution, $IM \overset{a}{\sim} \chi^2_Q$.

If the IM test rejects, the search for more efficient estimates can proceed along two lines. First, one might estimate HHG's FENB model. But HHG's FENB model imposes $\varphi_i = 1 + \phi_i$ and, even if this restriction holds, the FENB QCMLE apparently does not enjoy the robustness properties of the FEP QCMLE. Because the quasi-score for the FENB cannot be expressed as in (3.13), the FENB QCMLE is generally inconsistent for $\beta_o$ unless (2.8) and (2.9) hold with $a_i = 1$. For example, if the FEP model (2.1) and (2.2) holds, the FENB QCMLE is inconsistent for $\beta_o$. Consequently, the FEP QCMLE is preferred to the FENB QCMLE.

A second approach is to construct a minimum chi-square estimator that is more efficient than the multinomial QCMLE but is nevertheless consistent under (5.1)-(5.3). There are a variety of minimum chi-square estimators that meet these criteria. Here I cover only one example, namely an estimator that combines the orthogonality conditions implied by QCMLE and MNLS. In the notation of section 3, let $\hat\gamma_N \equiv \hat\beta_N$ be a preliminary consistent estimator of $\beta_o$ (typically the multinomial QCMLE). Define $z_i \equiv (1,n_i)'$,


L(x.,^^)  . 


^i^^^Vi^^N) 


Vi^^N^ 


(5.23) 


which  is  2T  x  2P,  and 


N 
i=l 


(5.24) 


which is $2P \times 2P$. The residual function $u_i(\beta) \equiv y_i - p_i(\beta)n_i$ can be expressed as $u_i(\beta) = y_i - C_i(\beta)z_i$, where $C_i(\beta) \equiv [0 \mid p_i(\beta)]$ is $T \times 2$. Note that $\nabla_\beta C_i(\beta)' = [0 \mid \nabla_\beta p_i(\beta)']$, which is $P \times 2T$.

Denote the minimum chi-square estimator by $\tilde\beta_N$. Then the asymptotic variance of $\tilde\beta_N$ is consistently estimated by $(\hat R_N'\hat\Xi_N^{-1}\hat R_N)^{-1}/N$, where

$$\hat R_N \equiv N^{-1}\sum_{i=1}^{N}\hat L_i'[z_iz_i' \otimes I_T]\nabla_\beta\hat C_i.$$

The overidentification test statistic $N\tau_N(\tilde\beta_N)$ (see (4.6)) is asymptotically $\chi^2$ with $2P - P = P$ degrees of freedom under (5.5), and provides additional evidence on the appropriateness of (5.1)-(5.3).

This section is concluded with a robust research methodology for count panel data models:

(i) Estimate model (5.1)-(5.3) by multinomial QCMLE. Compute the robust standard errors for $\hat\beta_N$.

(ii) Compute the robust Hausman specification test as in (5.19) and (5.20). If $H$ rejects, conclude that (5.1)-(5.3) is misspecified.

(iii) If the Hausman test in (ii) fails to reject, compute the information matrix test as in (5.21) and (5.22). If the IM test fails to reject, conclude that the multinomial distribution adequately describes $E(y_i \mid n_i,x_i)$ and $V(y_i \mid n_i,x_i)$. The efficiency gains from minimum chi-square estimation are unlikely to be worthwhile.

(iv) If the IM test in (iii) rejects, a minimum chi-square procedure might produce noticeably tighter estimates. Compute the overidentification test statistic as further evidence on model specification.

6.  Application  to  Gamma-type  Unobserved  Components  Models

This section briefly outlines how the CLP approach can be applied to models where the conditional variance is proportional to the square of the conditional mean. Some popular continuous distributions for nonnegative variables, in particular the lognormal and the gamma, have first two moments corresponding to this assumption. For $t = 1,\ldots,T$, assume that

$$E(y_{it} \mid x_i,\phi_i) = \phi_i\mu(x_{it},\beta_o) \qquad (6.1)$$

$$V(y_{it} \mid x_i,\phi_i) = \sigma_o^2[E(y_{it} \mid x_i,\phi_i)]^2 = \sigma_o^2[\phi_i\mu(x_{it},\beta_o)]^2 \qquad (6.2)$$

$$CV(y_{it},y_{ir} \mid x_i,\phi_i) = 0, \quad t \neq r. \qquad (6.3)$$

Assumption (6.1) is essentially the same as (5.1), while (6.3) corresponds to (5.3). Note that only one unobserved effect, $\phi_i$, is allowed. I do not know how to allow the proportionality parameter $\sigma_o^2$ to vary across $i$. Equation (6.2) corresponds to what statisticians refer to as a "constant coefficient of variation" model (e.g. McCullagh and Nelder (1989, chapter 8)) because the ratio of the standard deviation to the mean is constant:

$$SD(y_{it} \mid x_i,\phi_i)/E(y_{it} \mid x_i,\phi_i) = \sigma_o. \qquad (6.4)$$

As far as I know, there has been no work analyzing such models in an unobserved components, panel data setting, with or without distributional assumptions. This is probably because, when $y_{it}$ is a nonnegative continuous random variable, most researchers use $\log(y_{it})$ in a linear fixed effects model. But if interest lies in $E(y_{it} \mid x_i,\phi_i)$, (6.1)-(6.3) might be preferred; additional assumptions about $D(y_{it} \mid x_i,\phi_i)$ are needed to recover $E(y_{it} \mid x_i,\phi_i)$ from $E[\log(y_{it}) \mid x_i,\phi_i]$.

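The conditioning that eliminates the individual effect $\phi_i$ is more restrictive than that used in section 5. Let $n_i \equiv \sum_{t=1}^{T}y_{it}$ be as before. Then, for each $t$, the linear predictor of $y_{it}$ on $n_i$, conditional on $(x_i,\phi_i)$, is:

$$L(y_{it} \mid n_i;x_i,\phi_i) = \frac{E(y_{it}n_i \mid x_i,\phi_i)}{E(n_i^2 \mid x_i,\phi_i)}\,n_i. \qquad (6.5)$$

But

$$
\begin{aligned}
E(y_{it}n_i \mid x_i,\phi_i) &= CV(y_{it},n_i \mid x_i,\phi_i) + E(y_{it} \mid x_i,\phi_i)E(n_i \mid x_i,\phi_i) \\
&= V(y_{it} \mid x_i,\phi_i) + [\phi_i\mu(x_{it},\beta_o)]\Big[\phi_i\sum_{r=1}^{T}\mu(x_{ir},\beta_o)\Big] \\
&= \phi_i^2\Big[\sigma_o^2\mu(x_{it},\beta_o)^2 + \mu(x_{it},\beta_o)\sum_{r=1}^{T}\mu(x_{ir},\beta_o)\Big].
\end{aligned}
$$

Similarly,

$$
\begin{aligned}
E(n_i^2 \mid x_i,\phi_i) &= V(n_i \mid x_i,\phi_i) + [E(n_i \mid x_i,\phi_i)]^2 \\
&= \phi_i^2\Big[\sigma_o^2\sum_{r=1}^{T}\mu(x_{ir},\beta_o)^2 + \Big(\sum_{r=1}^{T}\mu(x_{ir},\beta_o)\Big)^2\Big].
\end{aligned}
$$

Therefore,

$$L(y_{it} \mid n_i;x_i,\phi_i) = p_t(x_i,\theta_o)n_i, \qquad (6.6)$$

where $\theta_o \equiv (\beta_o',\sigma_o^2)'$ and

$$p_t(x_i,\theta) \equiv \frac{\sigma^2\mu(x_{it},\beta)^2 + \mu(x_{it},\beta)\sum_{r=1}^{T}\mu(x_{ir},\beta)}{\sigma^2\sum_{r=1}^{T}\mu(x_{ir},\beta)^2 + \big(\sum_{r=1}^{T}\mu(x_{ir},\beta)\big)^2}. \qquad (6.7)$$

Note that $\sum_{t=1}^{T}p_t(x_i,\theta) = 1$ for all $\theta$. In terms of the vector $y_i$, (6.6) is expressed as

$$L(y_i \mid n_i;x_i,\phi_i) = p(x_i,\theta_o)n_i, \qquad (6.8)$$

where $p(x_i,\theta) \equiv (p_1(x_i,\theta),\ldots,p_T(x_i,\theta))'$.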
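
As a small numerical check on the conditioning argument, the Python sketch below evaluates the weights implied by (6.1)-(6.3): the ratio $E(y_{it}n_i \mid x_i,\phi_i)/E(n_i^2 \mid x_i,\phi_i)$, in which $\phi_i^2$ cancels, leaving $[\sigma^2\mu_t^2 + \mu_t\sum_r\mu_r]/[\sigma^2\sum_r\mu_r^2 + (\sum_r\mu_r)^2]$. The function name `gamma_clp_weights` and the numbers used are hypothetical.

```python
import numpy as np

def gamma_clp_weights(mu, sigma2):
    """Weights p_t for one unit, given mu = (mu_1, ..., mu_T) and sigma^2.

    Numerator:   sigma^2 * mu_t^2 + mu_t * sum_r mu_r
    Denominator: sigma^2 * sum_r mu_r^2 + (sum_r mu_r)^2
    """
    mu = np.asarray(mu, dtype=float)
    total = mu.sum()
    num = sigma2 * mu ** 2 + mu * total
    den = sigma2 * np.sum(mu ** 2) + total ** 2
    return num / den

mu_demo = np.array([1.0, 2.0, 3.0])        # illustrative mean values
p_gamma = gamma_clp_weights(mu_demo, sigma2=0.5)
p_limit = gamma_clp_weights(mu_demo, sigma2=0.0)
```

The weights sum to one for any value of $\sigma^2$, and with $\sigma^2 = 0$ they collapse to the simple shares $\mu_t/\sum_r\mu_r$.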

Although the right hand side of (6.8) is of the same form as (5.5), there is an important difference. Under (6.1)-(6.3), (6.8) does not also represent $L(y_i \mid 1,n_i;x_i,\phi_i)$. In fact, the CLP of $y_i$ on unity and $n_i$ depends on $\phi_i$, rendering it useless for estimating $\sigma_o^2$ and $\beta_o$. Taking $z_i \equiv n_i$ in section 3 restricts the class of consistent estimators for $\sigma_o^2$ and $\beta_o$. Nevertheless, there are plenty of orthogonality conditions of the form

$$E[D(x_i,\gamma^*)'n_i(y_i - p(x_i,\theta_o)n_i)] = 0 \qquad (6.9)$$

to identify $\sigma_o^2$ and $\beta_o$.

Weighted MNLS estimators, which solve

$$\min_{\theta \in \Theta}\ \sum_{i=1}^{N}(y_i - p(x_i,\theta)n_i)'G(x_i,\hat\gamma_N)(y_i - p(x_i,\theta)n_i), \qquad (6.10)$$

are generally consistent and asymptotically normally distributed. Further, given one such estimator, it is straightforward to stack WMNLS orthogonality conditions to obtain a more efficient minimum chi-square estimator. The robust, regression-based Hausman test for comparing two WMNLS estimators covered by Wooldridge (1991) is a special case of the tests in section 4, and can be used to test the validity of (6.8).

7.  Models  with  Serial  Correlation 

For some applications, the zero covariance assumptions (2.12) and (6.3) might be too restrictive (although recall that these are conditional on latent effects). For the model in section 6, it is straightforward to relax the zero covariance assumption. In fact, (6.2) and (6.3) can be replaced with the general assumption

$$V(y_i \mid x_i,\phi_i) = \phi_i^2\Omega(x_i,\delta_o), \qquad (7.1)$$

where $\Omega(x_i,\delta)$ is a $T \times T$ positive definite variance function. The conditioning argument used in section 6 still eliminates $\phi_i$. In fact, letting $\Omega_t(x_i,\delta)$ denote the $t$-th row of $\Omega(x_i,\delta)$ and $j_T \equiv (1,1,\ldots,1)'$, $L(y_{it} \mid n_i;x_i,\phi_i)$ is given by (6.6) with

$$p_t(x_i,\theta_o) \equiv \frac{\Omega_t(x_i,\delta_o)j_T + \mu(x_{it},\beta_o)\sum_{r=1}^{T}\mu(x_{ir},\beta_o)}{j_T'\Omega(x_i,\delta_o)j_T + \big(\sum_{r=1}^{T}\mu(x_{ir},\beta_o)\big)^2}. \qquad (7.2)$$

Model (6.1) and (7.1) allows for serial correlation and a variety of variance functions. For example, a gamma-type model with constant AR(1) serial correlation would take

$$\Omega(x_i,\delta_o) = \sigma_o^2\Lambda(x_i,\beta_o)R_T(\rho_o)\Lambda(x_i,\beta_o), \qquad (7.3)$$

where

$$\Lambda(x_i,\beta_o) \equiv \mathrm{diag}\{\mu(x_{i1},\beta_o),\ldots,\mu(x_{iT},\beta_o)\} \qquad (7.4)$$

and $R_T(\rho)$ is the $T \times T$ matrix with $(r,t)$-th element $\rho^{|t-r|}$. This model maintains (6.2) but relaxes (6.3) to $CV(y_{it},y_{ir} \mid x_i,\phi_i) = \phi_i^2\sigma_o^2\rho_o^{|t-r|}\mu(x_{it},\beta_o)\mu(x_{ir},\beta_o)$.
o    It   o     ir   o 

Model (6.1) and (7.1) can also allow for serial correlation in count-type models, but the individual dispersion is restricted in this case. A model with constant AR(1) serial correlation chooses $\Omega(x_i,\delta_o)$ as in (7.3), except that

$$\Lambda(x_i,\beta_o) \equiv \mathrm{diag}\{[\mu(x_{i1},\beta_o)]^{1/2},\ldots,[\mu(x_{iT},\beta_o)]^{1/2}\}. \qquad (7.5)$$

In terms of model (2.10)-(2.12), (2.11) has been maintained and (2.12) has been relaxed at the cost of imposing $\varphi_i = \sigma_o^2\phi_i$ (compare to the HHG FENB assumption $\varphi_i = 1 + \phi_i$).
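
The AR(1) variance functions built from (7.3) with (7.4) or (7.5) can be explored numerically. The Python sketch below constructs $\Omega$ for both the gamma-type and the count-type cases and evaluates the weights obtained by applying the section 6 conditioning argument to (6.1) and (7.1), namely $[\Omega_t j_T + \mu_t\sum_r\mu_r]/[j_T'\Omega j_T + (\sum_r\mu_r)^2]$. This expression is derived here from those two assumptions and should be read as a sketch; all names and numbers are hypothetical.

```python
import numpy as np

def ar1_corr(T, rho):
    """T x T matrix with (r, t) element rho**|t - r|."""
    t = np.arange(T)
    return rho ** np.abs(t[:, None] - t[None, :])

def clp_weights(mu, omega):
    """Weights (Omega_t j + mu_t * sum(mu)) / (j' Omega j + sum(mu)^2) for one unit."""
    mu = np.asarray(mu, dtype=float)
    total = mu.sum()
    return (omega.sum(axis=1) + mu * total) / (omega.sum() + total ** 2)

def ar1_omega(mu, sigma2, rho, count_type=False):
    """sigma^2 * Lam R_T(rho) Lam, with Lam = diag(mu) or diag(sqrt(mu))."""
    mu = np.asarray(mu, dtype=float)
    lam = np.diag(np.sqrt(mu) if count_type else mu)
    return sigma2 * lam @ ar1_corr(mu.size, rho) @ lam

mu_demo = np.array([1.0, 2.0, 3.0])        # illustrative mean values
p_gamma_ar1 = clp_weights(mu_demo, ar1_omega(mu_demo, 0.5, 0.4))
p_count_ar1 = clp_weights(mu_demo, ar1_omega(mu_demo, 0.5, 0.4, count_type=True))
p_count_rho0 = clp_weights(mu_demo, ar1_omega(mu_demo, 0.5, 0.0, count_type=True))
```

With the count-type choice and $\rho = 0$, the weights reduce to $\mu_t/\sum_r\mu_r$ whatever the value of $\sigma^2$, so $\sigma^2$ drops out of the weights in that case.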

The identification issue in these more complicated models warrants some attention. As was seen in section 2, $\sigma_o^2$ is not identified if $\rho_o = 0$, in which case $\varphi_i$ is free to vary independently of $\phi_i$. Also, when $\rho_o \neq 0$, the CLP $L(y_i \mid n_i;x_i,\phi_i)$ must be used to estimate $\beta_o$, $\rho_o$, and $\sigma_o^2$; the CLP $L(y_i \mid 1,n_i;x_i,\phi_i)$ now depends on $\phi_i$ and is therefore useless. Thus, the multinomial QCMLE, which is consistent under (2.10)-(2.12), is no longer consistent when $\rho_o \neq 0$. A weighted least squares procedure or GMM estimator must be used instead with $p_t(x_i,\theta_o)$ given by (7.2).

8.  Concluding  Remarks 

This  paper  has  shown  how  the  notion  of  a  conditional  linear  predictor 
can  be  used  to  eliminate  individual  components  in  certain  classes  of 
multiplicative  unobserved  effects  models.   This  technique  can  be  viewed  as  a 
particular  implementation  of  the  general  approach  suggested  by  Neyman  and 
Scott  (1948)  for  obtaining  consistent  estimates  of  fixed  dimensional 

parameters  in  the  presence  of  an  infinite  dimensional  nuisance  parameter. 

The  first  two  moments  of  the  count  model  in  section  5  should  be  general 
enough  for  many  applications.   A  corollary  of  the  analysis  is  that  the 
multinomial  QCMLE  has  important  robustness  properties,  and  can  be  used  to 
consistently  estimate  the  parameters  of  a  fairly  general  mean  and  variance 
function.   Nevertheless,  obtaining  minimum  chi-square  estimates  could  result 
in  efficiency  gains.   The  model  of  section  6,  intended  for  continuous, 
nonnegative  variables,  can  be  used  in  place  of  the  usual  practice  of  taking 
natural  logs  and  postulating  a  linear  model. 

The models in sections 5, 6, and 7 assume that $x_i$ is strictly exogenous conditional on the latent variable or variables. This rules out feedback from $y_{it}$ to $x_{ir}$, $r > t$ (i.e., $\{y_{it}\}$ cannot Granger-cause $\{x_{it}\}$). While this is natural for certain explanatory variables, it is difficult to justify in general. For example, in HHG's patents-R&D application, the number of patents awarded in one year could affect subsequent R&D expenditures. If so, all of the estimators considered in section 5 are generally inconsistent. Future research could usefully investigate how to relax the strict exogeneity assumption in nonlinear unobserved components models.

Finally, conditional linear predictors can also be used to robustly estimate multiplicative unobserved components models for multivariate time series. For example, suppose that $\{(y_t,x_t): t = 1,2,\ldots\}$ is a vector time series, with $y_t$ a $J \times 1$ vector of counts and $x_t$ a $K \times 1$ vector of conditioning variables. A multiplicative unobserved components model might specify an analog of (2.1) and (2.2): for $j = 1,\ldots,J$,

$$y_{tj} \mid x_t,\phi_t \sim \mathrm{Poisson}(\phi_t\mu_j(x_t,\beta_o)) \qquad (8.1)$$

$$y_{tj},\ y_{th} \ \text{are independent, conditional on } x_t,\phi_t, \quad j \neq h. \qquad (8.2)$$

Any dependence between $y_{tj}$ and $y_{th}$ is due entirely to the unobserved (or "common") component $\phi_t$. If (8.1) and (8.2) hold, the conditioning argument used in section 2 can be used to eliminate $\phi_t$. Then $\beta_o$ can be estimated by CMLE (although the score of the log-likelihood would not necessarily be a martingale difference sequence). Alternatively, the moment assumptions

$$E(y_{tj} \mid x_t,\phi_t,\varphi_t) = \phi_t\mu_j(x_t,\beta_o) \qquad (8.3)$$

$$V(y_{tj} \mid x_t,\phi_t,\varphi_t) = \varphi_t\phi_t\mu_j(x_t,\beta_o) \qquad (8.4)$$

$$CV(y_{tj},y_{th} \mid x_t,\phi_t,\varphi_t) = 0, \quad j \neq h \qquad (8.5)$$

can be used. If $n_t \equiv \sum_{j=1}^{J}y_{tj}$, then the linear predictor of $y_t$ on $(1,n_t)'$, conditional on $(x_t,\phi_t,\varphi_t)$, eliminates the unobservables $\phi_t$ and $\varphi_t$, as before. Generally, unless (8.3)-(8.5) represent completely specified dynamics, the theory of section 3 must be extended to allow the score to be serially correlated. But this is no more difficult than QCMLE or GMM with incompletely specified dynamics.

Endnotes

1. The term "conditional maximum likelihood" is somewhat unfortunate in light of modern econometrics. This is because virtually all maximum likelihood is conditional on a set of explanatory variables. The phrase "conditional maximum likelihood" appeared in the statistics literature where explanatory variables are treated as nonrandom. Thus, one specifies an unconditional joint distribution for the explained variables (which depends on the explanatory variables), and then conditions on a function of the explained variables. Because it is too late to change the terminology, I stick to standard usage; the term conditional MLE is reserved for the case when a function of the explained variables is conditioned on.

2. For linear models, Chamberlain (1982) uses a linear projection argument, which imposes no restrictions on the distribution of $\phi_i$ given $x_i$. Due to the nonlinear nature of the current models, this approach is unavailable.

3. If the asymptotic variance of $\sqrt{N}(\hat\theta_N - \theta_o)$ is $V_o$, it is natural to define $V_o/N$ to be the asymptotic variance of $\hat\theta_N$, denoted $AV(\hat\theta_N)$. If $\hat V_N$ is a consistent estimator of $V_o$, then $\hat V_N/N$ is said to be an estimator of $AV(\hat\theta_N)$.

4.  HHG  introduce  two  unobserved  effects,  which  they  label  ^i.    and  4> .  . 
However,  it  is  easily  seen  from  their  equations  (HHG  (1984,  p.  924))  that 
their FENB model is equivalent to (2.8)-(2.9) with $a_i = 1$.